Once material has been learned to a criterion of one perfect trial, further study within the same session
constitutes overlearning. Although overlearning is a popular learning strategy, its effect on long-term 
retention is unclear. In two experiments presented here, 218 college students learned geography facts 
(Experiment 1) or word definitions (Experiment 2). The degree of learning was manipulated and measured 
via multiple test-with-feedback trials, and participants returned for a final cued recall test between one and 
nine weeks later. The overlearners recalled far more than the low learners at the one-week test, but this 
difference decreased dramatically thereafter. These data suggest that overlearning (and its concomitant 
demand for additional study time) is an inefficient strategy for learning material for meaningfully long 
periods of time. 



Study duration may be the most frequently 
manipulated variable in the field of memory. The
ubiquity of this manipulation probably reflects its 
theoretical importance, but its practical implications 
are arguably even greater. Of particular concern is the 
relationship between study session duration and
long-term retention. For example, after a student has learned
the definitions for each of ten vocabulary words, what 
are the benefits of continued study of the same 
material? 

The immediate continuation of practice beyond 
the criterion of one perfect instance is defined as 
overlearning. Thus, if criterion is reached but further 
study is delayed until a subsequent session, the post- 
criterion practice is not an instance of overlearning. 
Another point to be clarified is the distinction between 
overlearning and the ultimate degree of mastery. As 
defined above, material is overlearned if it is learned 
with an overlearning strategy, regardless of how well 
the material was learned. Likewise, material can be
extremely well learned without the use of an
overlearning strategy. For example, virtually everyone
has mastered the names of the 12 calendar months, but 
few have accomplished this by overlearning (i.e., a 
single study session with post-criterion learning). It is 
the strategy of overlearning that is assessed in the 
present studies rather than the utility of mastery per se 
(which, incidentally, does appear to boost long-term 
retention, e.g., Conway, Cohen, & Stanhope, 1992). 

Overlearning occurs frequently in education 
and training, perhaps because it is commonly cited as a 
useful strategy in review chapters and textbooks 
regarding education and training (e.g., Aamodt, 1999;
Fitts, 1965; Foriska, 1993; Hagman & Rose, 1983;
Spector, 2000). Overlearning is generally described as
a means of ensuring long-term retention. As Foriska
(1993) explained, “The information or skill to be 
learned is finally moved from short-term memory to 
long-term memory by overlearning the information . . .”
(p. 40). Similarly, Hall (1989) wrote, “The
overlearning effect would appear to have considerable
practical value since continued practice on material 
already learned to a point of mastery can take place 
with a minimum of effort, and yet will prevent 
significant losses in retention” (p. 328). Fitts (1965)
concluded, “The importance of continuing practice 
beyond the point in time where some (often arbitrary) 
criterion is reached cannot be overemphasized” (p. 
195). In summary, overlearning is widely advocated,
and this advice is entirely consistent with empirical
studies of overlearning. 

The Empirical Literature 

In the majority of previous overlearning 
studies, the data show that overlearning leads to greater 
recall than lesser degrees of learning. One such finding 
was presented by Krueger (1929), which may be the
most frequently cited paper in this area. In that study, 
participants completed multiple learning trials with the 
same list of words, and the overlearning condition 
included twice as many learning trials as the control 
condition. This margin of overlearning is commonly 
referred to as 100% overlearning. When participants 
were tested between 1 and 28 days later, the 
overlearned material was better recalled than the non- 
overlearned material. This finding has since been 
replicated many times with varying procedures and 
different kinds of materials (e.g., Bromage & Mayer, 
1986; Craig, Sternthal, & Olshan, 1972; Earhard, 
Fried, & Carlson, 1972; Gilbert, 1957; Nelson, 
Leonesio, Shimamura, Landwehr, & Narens, 1982;
Postman, 1962; Richardson, 1973; Rose, 1992). By
contrast, we know of one study that revealed virtually 
no benefit of overlearning (Kratochwill, Demuth, & 
Conzemius, 1977). 
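
The percentage convention for the margin of overlearning can be made explicit with a short calculation. The sketch below is illustrative only: the function name and arguments are hypothetical, and it assumes the margin is expressed relative to the number of learning trials in the control condition.

```python
def overlearning_margin(control_trials: int, overlearning_trials: int) -> float:
    """Return the margin of overlearning as a percentage of the control trials."""
    if control_trials <= 0:
        raise ValueError("control_trials must be positive")
    extra_trials = overlearning_trials - control_trials
    return 100.0 * extra_trials / control_trials


# Twice as many trials as the control condition corresponds to 100% overlearning.
print(overlearning_margin(10, 20))
```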

A similar account of the literature is given by a 
meta-analysis by Driskell, Willis, and Cooper (1992). 
These authors examined 11 overlearning studies with 
cognitive tasks, and they found that the effect of 
overlearning on a subsequent test was moderate in size 
(d = .753). In brief, then, the results of previous
overlearning studies have shown that overlearning can 
boost subsequent retention. 

However, these studies leave open the 
possibility that the benefits of overlearning may not be 
long lasting. Of the 11 studies in the Driskell et al. 
meta-analysis, for instance, a majority relied solely on 
retention intervals of one week or less, and none 
included a retention interval greater than four weeks. 
Moreover, Driskell et al. observed that the difference 
between the overlearners and low learners was smaller
at longer retention intervals (i.e., retention interval was 
a moderator). However, this finding is a correlation 
and therefore subject to alternative explanations. For 
instance, most of the 11 studies relied on a single 
retention interval, which means that this correlation 
may have partly reflected procedural differences 
between the studies with short retention intervals and 
the studies with longer retention intervals. 
Nevertheless, the results of the Driskell et al. meta- 
analysis are consistent with the possibility that the 
benefits of overlearning may dissipate at longer 
retention intervals. 

In summary, it seems clear that overlearning 
can positively affect subsequent performance, but little
is known about how this relationship is modulated by
retention interval. An understanding of this relationship 
is essential to making rational decisions about the 
optimum degree of study when long-term retention is 
the goal. Toward this aim, the present studies included 
manipulations of learning level and retention interval 
(RI), and each study included a retention interval of at 
least four weeks. 

Criterion- vs. Duration-Based Procedures 

In overlearning studies, the degree of learning 
is manipulated by one of two procedures. In the 
criterion-based procedure, participants study or 
practice until they reach a criterion of one perfect trial 
before stopping or continuing. Thus, this method 
requires that participants’ performance be measured 
during the learning session. For example, the learning 
of paired associates (e.g., Talara-Peru) is typically 
assessed by the use of multiple trials with test items 
(Talara - ?) followed immediately by feedback (Peru). 

In the second type of procedure, the duration 
of study (or the number of learning trials) is 
predetermined for each degree of learning. For 
example, the overlearning condition might include 20 
learning trials, whereas the control condition might 
include only 10 trials. This technique ideally includes a 
measure of performance during the learning session in 
order to determine, for example, whether the 
participants in the overlearning condition actually 
reached and surpassed criterion. 

Each procedure has its advantages and 
disadvantages. The criterion-based procedure ensures 
the precise level of learning for all participants, but the 
analyses are complicated by the variability in the study 
duration within a condition (as some participants reach 
criterion more quickly than others). The duration-based 
procedure ensures an equal number of learning trials 
(or equal amount of study time) for all participants in 
the same condition, but it can be difficult for 
researchers to choose study durations that produce the 
desired degree of learning. Of the two procedures, the 
duration-based procedure has been used slightly more 
often than the criterion-based procedure. 

Overview of Experiments 

The two experiments presented here rely on 
the duration-based procedure, and we assessed 
performance during the learning session of each 
experiment. In addition, we conducted secondary 
analyses that incorporated the criterion-based 
definition of overlearning. These analyses essentially 
excluded participants who achieved too few or too 
many perfect trials. 

In Experiment 1, participants studied paired 
associates (city-country pairs), and the number of 
learning trials was manipulated. In Experiment 2, 
participants studied paired associates (word-definition
pairs), and the number of study words (but not total
study duration) was varied, which effectively 
manipulated the amount of study time per word. In 
both experiments, retention interval was manipulated 
as well. 

Experiment 1 

The first study examined the effects of 
overlearning paired associates on a subsequent test. 
College students learned 10 city-country pairs (e.g., 
Talara-Peru) by test-with-feedback trials, with each 
trial including all 10 pairs. The number of learning 
trials equaled 5 or 20 (Lo vs. Hi Learning), and 
participants returned for a test after one, three, or nine 
weeks. 

Method 

Participants. The sample included 130 
undergraduates at the University of South Florida. An 
additional eight students completed the learning 
session but failed to show for the test. 

Design. Learning Level (Lo, Hi) and Retention 
Interval (1, 3, 9 weeks) were between-subjects 
variables, and each participant was assigned randomly 
to one of the six groups. 

Materials. Participants studied the 10 city- 
country pairs (e.g., Talara-Peru) listed in the Appendix. 
We used real world knowledge rather than random 
word pairings, because we believed that participants 
would find the task more interesting and therefore 
exhibit greater motivation. In order to assess whether 
the participants may have known these city-country 
pairs prior to the experiment, we surveyed a different 
sample of 50 participants from the same population. 
They were asked to supply the country for each city 
(e.g., Talara-?) and were informed that each country 
name included five or fewer letters. Their accuracy 
averaged 1.4% (i.e., 7 of 500). 

Procedure. During the learning session, each 
participant received a booklet with a different page 
devoted to each trial, with each trial including all 10 
items. On the first trial, participants simply saw a list 
of 10 city-country pairs for one minute. Immediately 
afterwards, the Lo and Hi Learners completed 5 or 20 
test-with-feedback trials, respectively. For each of 
these one-minute trials, the booklet page included a 
column of the 10 cities. Participants were asked to 
write each city’s corresponding country on a horizontal 
line immediately to the right of each city name. The 
order of the cities varied randomly across trials to 
ensure that participants were not merely learning serial 
position. Handwriting time was minimized by the use 
of countries with five or fewer letters and by 
encouraging participants to write in cursive. After 50 
seconds, participants were prompted to unfold the right 
side of the page, which revealed each correct answer in 
a location directly to the right of the corresponding 



written response. Participants then studied for the 
remaining 10 seconds of the trial before turning the 
page and beginning the next trial. After the fifth test- 
with-feedback trial, all participants ceased studying 
and the Lo Learners departed the room. A few minutes 
later, the Hi Learners completed the remainder of their 
20 learning trials. 

One, three, or nine weeks later, participants 
returned for the test. Each received a page listing the 
10 cities and was asked to recall the corresponding 
countries in three minutes. 

Results and Discussion 

Learning Session. Virtually all Hi Learners 
but only a minority of Lo Learners produced more than 
one perfect learning trial. Specifically, 58 of the 63 Hi 
Learners produced at least 3 perfect trials, and this 
subgroup was dubbed the True Hi Learners. By 
contrast, 44 of the 67 Lo Learners produced no more 
than one perfect trial, and this subgroup was dubbed 
the True Lo Learners. All subsequent analyses were 
performed twice: once for the Hi vs. Lo Learners and 
once for the True Hi vs. True Lo Learners. 
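
The subgroup rule just described can be written as a small classification function. This is a minimal sketch, not the authors' analysis code: the function names and the example trial scores are hypothetical, and a "perfect trial" is assumed to mean that all 10 items were answered correctly on that trial.

```python
def count_perfect_trials(trial_scores, n_items=10):
    """Count learning trials on which every item was answered correctly."""
    return sum(1 for score in trial_scores if score == n_items)


def classify_learner(condition, trial_scores, n_items=10):
    """Return 'True Hi', 'True Lo', or None (not in either subgroup)."""
    perfect = count_perfect_trials(trial_scores, n_items)
    if condition == "Hi" and perfect >= 3:
        return "True Hi"  # Hi Learner with at least 3 perfect trials
    if condition == "Lo" and perfect <= 1:
        return "True Lo"  # Lo Learner with no more than one perfect trial
    return None


# Hypothetical per-trial scores (number correct out of 10).
print(classify_learner("Hi", [7, 9, 10, 10, 10]))
print(classify_learner("Lo", [5, 7, 8, 9, 10]))
```

Participants for whom the function returns None appear only in the Hi vs. Lo analyses, not in the True Hi vs. True Lo analyses.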

Accuracy during the learning session improved 
with successive trials, of course, as illustrated in Figure 
1A. The Hi Learners’ and Lo Learners’ curves coincide
during the first five trials, because these two groups
underwent identical procedures during this period. By
contrast, there is no such overlap between the True Hi 
and True Lo Learners despite identical procedures 
during the first five trials, because these two groups 
were not selected randomly. That is, the True Hi 
Learners represent the best performing Hi Learners, 
and the True Lo Learners comprise the worst 
performing Lo Learners. Incidentally, the dip in the 
curve after trial five for both the Hi and True Hi 
Learners reflects a brief rest period, as described in the 
procedure section. 

Figure 1A also illustrates that both the Hi and
True Hi Learners neared the ceiling well before the end 
of the learning session, whereas their counterparts 
failed to reach asymptote. On the last learning trial, the 
Hi Learners averaged 97%, and the Lo Learners 
averaged 85%, F (1, 128) = 15.53, p < .001. Likewise, 
the True Hi Learners averaged 99% on their last 
learning trial, while the True Lo Learners averaged 
only 77%, F (1, 100) = 68.729, p < .001.

Test Session. Test performance is shown in 
Figure 1B. The Hi Learners recalled more than the Lo
Learners, as indicated by a significant main effect of 
learning level given by a two-factor analysis of 
variance, F (1, 124) = 33.29, p < .001. Likewise, the 
main effect of retention interval was also reliable, F (2, 
124) = 34.78, p < .001. More importantly, the 
difference between the Hi and Lo Learners declined 
with retention interval, as evidenced by a significant
interaction between learning level and retention
interval, F (2, 124) = 8.65, p < .001. This declining 
difference between the Hi and Lo Learners was further 
evidenced by Holm-Sidak multiple comparison tests 
(with alpha = .05) showing that the Hi and Lo Learners 
differed significantly at week one (t = 6.85) and week
three (t = 2.03) but not week nine (t = 1.26).
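
For readers unfamiliar with the Holm-Sidak procedure, the following sketch shows the standard step-down rule. It is not a reconstruction of the analysis reported here, which gives only t statistics; the function name is hypothetical, and the p-values in the example are made up for illustration and are not taken from the study.

```python
def holm_sidak(p_values, alpha=0.05):
    """Step-down Holm-Sidak procedure; returns reject decisions in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Sidak-adjusted threshold for the (rank + 1)-th smallest p-value.
        threshold = 1.0 - (1.0 - alpha) ** (1.0 / (m - rank))
        if p_values[idx] >= threshold:
            break  # stop at the first non-significant comparison
        reject[idx] = True
    return reject


# Hypothetical p-values for three pairwise comparisons (NOT the study's values).
print(holm_sidak([0.004, 0.020, 0.300]))  # [True, True, False]
```

The smallest p-value is tested against the strictest threshold, and testing stops at the first comparison that fails, so later comparisons cannot be declared significant once an earlier one is not.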

This convergence of retention curves was also 
observed for the True Hi and True Lo Learners, as 
shown in Figure 1B. Specifically, a two-factor analysis
of variance revealed statistical significance for learning 
level, F (1, 96) = 37.19, p < .001, retention interval, F 
(2, 96) = 34.26, p < .001, and the interaction, F (2, 96) 
= 10.95, p < .001. The Holm-Sidak multiple 
comparison tests revealed significant differences
between Hi and Lo Learners at week one (t = 7.82) but
not week three (t = 1.67) or week nine (t = 1.58).

Finally, we assessed whether the interaction of 
retention curves shown in Figure 1B may partly reflect
a floor effect on the Lo Learners at the longer retention 
intervals. To examine this possibility, we tabulated the 
number of participants who recalled zero items at test. 
Although zero scores were rare and equally frequent 
among Hi and Lo Learners at weeks one and three, 
there were more zero scores among the Lo Learners (6 
of 23) than Hi Learners (2 of 20) at week nine. To 
examine whether these zero scores contributed to the 
findings described above, we analyzed the test data 
shown in Figure 1B without the data from the
nine-week test. Specifically, a two-factor analysis of
variance revealed statistical significance for learning
level, F (1, 83) = 31.56, p < .001, retention interval,
F (1, 83) = 38.81, p < .001, and the interaction between
learning level and retention interval, F (1, 83) = 8.58, p
< .01. Likewise, when the same analysis was restricted
to the True Hi and True Lo Learners, statistical
significance was again observed for learning level, F (1,
66) = 32.78, p < .001, retention interval, F (1, 66) =
37.81, p < .001, and their interaction, F (1, 66) = 12.02,
p < .001.

Summary. The Hi Learners recalled more than 
the Lo Learners at each retention interval, but the Hi 
Learners’ retention declined at a greater rate than the 
Lo Learners’ retention, as indicated by the interaction 
in Figure 1B. Moreover, the Hi Learners’ retention
declined by a greater proportion as well. Specifically, 
the Hi Learners’ retention declined by about two-thirds 
(from 70% at week one to 24% at week nine), while 
the Lo Learners’ retention declined by less than one 
half during the same period (from 31% at week one to 
17% at week nine). 
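
The proportional declines quoted above follow directly from the reported percentages. The sketch below simply reproduces that arithmetic; the function name is hypothetical, and the inputs are the rounded percentages given in the text.

```python
def proportional_decline(initial_pct: float, final_pct: float) -> float:
    """Fraction of the initial recall level that was lost by the later test."""
    return (initial_pct - final_pct) / initial_pct


hi_decline = proportional_decline(70, 24)  # Hi Learners, week one to week nine
lo_decline = proportional_decline(31, 17)  # Lo Learners, week one to week nine
print(f"Hi: {hi_decline:.2f}, Lo: {lo_decline:.2f}")  # Hi: 0.66, Lo: 0.45
```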

A similar decline in overlearning benefits 
emerged in the analyses restricted to the higher 
achieving Hi Learners (or True Hi Learners) and the 
lesser achieving Lo Learners (or True Lo Learners).
Although these two subgroups probably differed in
their ability and motivation, this confound presumably
worked against the convergence reported here (by
magnifying the observed difference between the two
groups). Also, a majority of both the Lo and True
Lo Learners failed to achieve even one perfect learning 
trial, which is typically the criterion for the control 
condition in overlearning studies. Thus, the inclusion 
of these “less-than-criterion learners” presumably led 
to an observed difference that further overestimated the 
true benefit of studying after reaching criterion. 

Experiment 2 

Whereas the first experiment included a 
manipulation of total study time, the second 
experiment induced overlearning by varying the 
amount of learning material (and not the total study 
time). Specifically, the Lo Learners studied 20 word- 
definition pairs, and the Hi Learners were given the 
same amount of time to learn just 10 word-definition 
pairs. Thus, the Hi Learners received twice as much 
study time per item, and this was enough to produce 
overlearning. The learning session was again 
comprised of test-with-feedback trials, and participants 
were tested either one or four weeks after learning.

Method

Participants. A total of 88 USF 
undergraduates completed the experiment. An 
additional 13 students completed the learning session 
but did not return for the test. 

Design. Learning Level (Lo or Hi) and 
Retention Interval (1 or 4 weeks) were between- 
subjects variables, and each participant was randomly 
assigned to one of the four groups. 

Materials. Participants studied the word- 
definition pairs (e.g., cicatrix-scar) listed in the 
Appendix. Each definition was a single word 
comprised of four or fewer letters in order to minimize 
participants’ writing time. The Lo Learners studied all 
20 pairs, and the Hi Learners studied a random subset 
of 10 pairs. The base rate knowledge of these 
vocabulary words was assessed through a survey of
50 different USF students drawn from the same
population. None was able to provide a correct definition
or synonym for any of the words. 

Procedure. The procedure was identical to that 
of Experiment 1 with the following exceptions. The 
initial study-only trial was two minutes. All 
participants completed 20 test-with-feedback
“sub-trials.” Each sub-trial lasted 30 s and included five
words. For the Hi Learners (who studied the 10-word
list), the 20 sub-trials were grouped into 10 trials, with
each trial spanning two consecutive five-word
sub-trials so that each word appeared once in every trial.
Likewise, the Lo Learners (who studied the 20-word
list) completed 5 trials, with each trial spanning four
sub-trials so that each word appeared just once in every
trial. The order of the words varied across trials. 

One or four weeks later, participants returned 
for the test. They were given four minutes to provide 
the one-word definition for each study word (e.g., 
cicatrix - ?). 

Results and Discussion 

Learning Session. Most Hi Learners and 
virtually no Lo Learners produced more than one 
perfect trial. Specifically, 40 of the 46 Hi Learners 
achieved at least three perfect trials, and this subgroup 
was dubbed the True Hi Learners. By contrast, 41 of 
the 42 Lo Learners produced no more than one perfect 
trial, and this subgroup was dubbed the True Lo 
Learners. Consequently, the True Hi and True Lo 
Learners included all but 7 of 88 participants, thereby 
ensuring that the differences between the Hi and Lo 
Learners were virtually identical to the differences 
between the True Hi and True Lo Learners. 

Across learning trials, the Hi Learners 
performed much better than the Lo Learners, as shown 
in Figure 2A. On the last trial, the Hi Learners 
averaged 97%, and the Lo Learners averaged 70%, F 
(1, 86) = 50.12, p < .001. Likewise, the True Hi 
Learners averaged 98% on their last learning trial, 
compared to 70% for the True Lo Learners, F (1, 79) =
54.59, p < .001.

Test Session. The Hi Learners, who received 
twice as much study time per word as the Lo Learners, 
recalled a greater proportion of study words on the 
one-week test. But this difference disappeared by week 
four, as shown in Figure 2B. A two-factor analysis of 
variance revealed a main effect of Learning Level, F 
(1, 84) = 12.21, p < .001, a main effect of RI, F (1, 84) 
= 49.54, p < .001, and a reliable interaction between 
learning level and RI, F (1, 84) = 6.15, p < .02. This 
convergence of retention curves was again illustrated 
by Holm-Sidak multiple comparison tests (with alpha 
= .05) showing that the Hi and Lo Learners differed 
significantly at week one (t = 4.42) but not at week
four (t = 0.69).

The same pattern emerged when the analyses 
were restricted to the True Hi and True Lo Learners, as 
shown in Figure 2B. This is not surprising, because 
these two groups included virtually the same 
participants as the Hi and Lo Learners. The statistical 
analyses produced the same findings: a main effect of 
learning level, F (1, 77) = 10.20, p < .002, a main 
effect of RI, F (1, 77) = 39.76, p < .001, and an 
interaction, F (1, 77) = 4.07, p < .05. Likewise, the 
Holm-Sidak tests revealed significant differences
between Hi and Lo Learners at week one (t = 3.86) but
not at week four (t = 0.80).

The convergence of the retention curves shown 
in both panels of Figure 2B raises the possibility that
this interaction is due to a floor effect that
disproportionately affects the Lo Learners at the long 
retention interval. Although it is difficult to strictly rule 
out this possibility, this rival hypothesis is at odds with 
an analysis of the individual data. Specifically, at the 
long retention interval, the number of Lo Learners who 
recalled zero items was relatively small (2 of 20) and 
actually less than the number of Hi Learners who 
recalled nothing (5 of 20). Incidentally, there were no 
test scores of zero at the short retention interval. 

Although the Hi and Lo Learners recalled the 
same proportion of words on the 4-week test, the Lo 
Learners recalled a greater absolute number of words 
at both retention intervals. The conversion from 
proportions to absolute totals is straightforward. For 
the Hi Learners, who studied 10 words, their one- and 
four-week proportions of 64% and 22% convert to 6.4 
and 2.2 words, respectively. For the Lo Learners, who 
studied 20 words, their one- and four-week proportions 
of 38% and 18% convert to 7.5 and 3.5 words (after 
rounding). Thus, the Lo Learners recalled more words, 
on average, than the Hi Learners at both the 1-week RI 
(7.5 vs. 6.4) and the 4-week RI (3.5 vs. 2.2). Hence, 
the margin of difference equaled 1.3 words at each RI 
(coincidentally), meaning that there was no interaction 
like that observed with the proportional measures 
(Figure 2B). However, significance was obtained for 
Learning Level, F (1, 84) = 4.14, p < .05, and retention 
interval, F (1, 84) = 48.19, p < .001. The same pattern 
was observed for the absolute test scores of the True Hi 
and True Lo Learners: a main effect of learning level, 
F (1, 77) = 4.27, p < .05, a main effect of RI, F (1, 77) 
= 38.26, p < .001, and no interaction, F (1, 77) < 1. 
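
The conversion from proportions to absolute totals can be sketched as follows. The function name is hypothetical, and the inputs are the rounded percentages reported above. Because of that rounding, the Lo Learners' totals computed this way (7.6 and 3.6) come out slightly higher than the 7.5 and 3.5 given in the text, which were presumably derived from unrounded proportions.

```python
def words_recalled(percent_recalled: int, list_length: int) -> float:
    """Convert a recall percentage into an absolute number of words."""
    return percent_recalled * list_length / 100


# Rounded percentages reported in the text (one-week test, four-week test).
hi_totals = [words_recalled(p, 10) for p in (64, 22)]  # Hi Learners, 10 words
lo_totals = [words_recalled(p, 20) for p in (38, 18)]  # Lo Learners, 20 words
print(hi_totals, lo_totals)
```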

Summary. Participants either overlearned a 
10-word list or underlearned a 20-word list.
Overlearning boosted the chances of each item being 
recalled one week after learning, but no such benefit 
remained at week four. In addition to this convergence 
of retention curves, the Hi Learners forgot at a greater 
proportional rate. Specifically, the Hi Learners’ 
retention dropped by about two thirds (from 64% at 
week one to 22% at week four), while the Lo Learners’ 
retention dropped by about one half during the same 
time period (from 38% to 18%). Finally, with respect 
to the total number of words recalled, the 
underlearning of 20 words proved superior to the 
overlearning of 10 words at both retention intervals. 

General Discussion 

We presented two experiments that assessed 
the effect of overlearning on the long-term retention of 
paired associate learning. In Experiment 1, college 
students learned 10 paired associates, with one group 
of participants studying about four times as much as 
the other. This additional study effort produced 
overlearning, as evidenced by multiple perfect learning
trials, which in turn boosted their subsequent test
performance beyond that of their more moderately
learning cohorts. Yet the size of the benefit declined
sharply within several weeks of the learning session, as
shown in Figure 1B. In Experiment 2, college students
studied 10 or 20 paired associates, with the same study 
time allotted for either list. The 10-item group was able 
to achieve overlearning, as evidenced by multiple 
perfect learning trials, while the 20-item group was 
unable to reach a criterion of one perfect trial. The 
overlearned 10 items were more often recalled than the 
poorly learned 20 items when participants were tested 
one week after learning, but this benefit disappeared by 
four weeks, as shown in Figure 2B. Hence, in both 
experiments, overlearning boosted subsequent cued 
recall performance in comparison to a lesser degree of 
learning, but this benefit dissipated within several 
weeks. 

The Drawbacks of Overlearning 

These findings have implications for anyone 
who wants to retain information for at least several 
weeks, because the data cast doubt on the efficiency of 
overlearning as a strategy for achieving long-term 
retention of paired associates. Thus, these data are at 
odds with the pedagogical and empirical literature cited 
in the introduction. 

The question that arises, then, is why most 
previous overlearning studies have found benefits of 
overlearning that exceed those observed in the present 
studies. The discrepancy appears to be at least partly 
due to three procedural differences. First, most 
previous overlearning studies relied on retention 
intervals that were far shorter than those in the present 
studies. For instance, of the papers included in the 
Driskell et al. (1992) meta-analysis described in the 
introduction, 9 of 11 relied solely on retention intervals
of one week or less. That these studies would find 
strong support for overlearning is neither surprising nor 
inconsistent with the results presented here. Second, 
the majority of previous overlearning studies included 
a single retention interval, and this prevents the 
possibility of observing the declining benefits of
overlearning that are indicated by interactions like those
shown in Figures 1B and 2B. In Driskell et al., 7 of the
11 papers included only one retention interval. Third, 
and less significantly, a small number of purported 
overlearning studies required participants to delay their 
post-criterion study to a second session on a later day, 
whereas overlearning is usually defined as immediate 
post-criterion study. By delaying their post-criterion 
practice, these learners were able to exploit the benefit 
of the spacing effect, which is described in greater 
detail further below. This was the case in two of the 
studies in the Driskell et al. meta-analysis (Ausubel & 
Youssef, 1965; Ausubel, Stager, & Gaite, 1968), and
each of these studies boosted the overall effect size
given by the meta-analysis.

The Benefits of Overlearning 

Although an overlearning strategy may have 
limitations, we should emphasize that this learning 
strategy may be advisable in certain instances. For 
example, because overlearning proved extremely 
beneficial at the shortest retention intervals in the 
present studies, it might be ideal for learners who seek 
only short-term retention. For instance, a student with 
an exam later in the day might benefit from 
overlearning, as would anyone trying to amass foreign 
language vocabulary just before a brief trip. 
Furthermore, overlearning would be appropriate when 
there are dire consequences of forgetting. For example, 
if an employee must know certain safety procedures, 
overlearning might be advised. Similarly, Schendel and 
Hagman (1982) advocate an overlearning strategy for 
soldiers learning disassembly and assembly of machine 
guns, and the present results would not challenge that 
advice. 

A final set of caveats concerns the potentially
limited generality of the present results. In
Experiments 1 and 2, for instance, participants learned
paired associates, and it is thus possible that 
overlearning may provide better long-term retention 
for different kinds of tasks. For instance, it remains 
unknown whether overlearning is advisable for the 
long-term retention of more abstract skills such as 
mathematical procedures. Likewise, overlearning may
provide longer lasting benefits when the material lends 
itself to different kinds of encoding strategies. For 
example, strategies such as visual imagery or deep 
processing are likely precluded by the use of city- 
country pairs and word-definition pairs. Finally, it is 
possible that overlearning would have proven more 
useful at longer retention intervals if retention had been 
assessed by a free recall test or a recognition test.

Maximizing Recall Total

Although the studies presented here focused on 
the proportion of items recalled, the measure of recall 
total has more practical significance in some situations. 
When enriching one’s vocabulary, for instance, the 
total number of words learned is arguably more 
important than the proportion of studied items that 
were forgotten. For example, rather than study 25 
vocabulary words and later recall 15, a student who 
devoted the same total study time to 50 words might 
later recall 20. Here, then, the longer study list 
provides a greater boost in vocabulary. 

This hypothetical example is analogous to the 
results of Experiment 2. The 20-word learners were
able to recall more words than the 10-word learners
even though total study duration did not vary. Hence, 
although students are often required to master
relatively short lists of vocabulary words, the use of
much longer vocabulary lists (and a concomitant 
reduction in expected accuracy) might increase 
students’ vocabulary by a greater number of words. 
Such a strategy would also be appropriate during the 
preparation for a standardized test that includes 
vocabulary words, because these words are usually 
drawn from a list of thousands. 

This raises the question of how many items 
one should study during a session of a given duration 
in order to maximize recall total, which is equivalent to 
determining the optimal study time per item. In 
Experiment 2, recall total was greater for the 20-word 
learners (who averaged 36 s per word) than for the 10- 
word learners (who averaged 66 s per word). Here, 
then, it was better to study more words (and devote less 
time to each word). However, we surmise that the 
optimal study duration per item is not necessarily as 
brief as possible. In a compilation of data from 
Experiments 1 and 2 and a number of additional 
unpublished studies in our laboratory, recall total was 
an inverted U-function of study time per item, meaning 
that recall total was optimized by an intermediate value 
of study time per item (when total study duration 
remains constant). For a one-week retention interval, 
for example, this optimum was about 30 s per item. 
Thus, by this preliminary analysis, if students wish to 
increase their vocabulary in preparation for a 
standardized test that is to be taken in one week, the 
number of study words should be that which produces 
an average of about 30 s of study time per word 
(although this duration is probably best distributed 
across multiple presentations). Of course, this value 
would most likely vary with the kind of material, the 
amount of material, and the type of test. 
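
As a rough illustration of this rule of thumb, the number of study words for a session is simply the session duration divided by the target study time per word. The Python sketch below assumes the preliminary one-week optimum of about 30 s per item mentioned above; the function name `words_for_session` and the 15-minute session length are hypothetical choices made for the example, and the target value would need to be adjusted for other materials, amounts of material, or test types.

```python
def words_for_session(session_seconds, seconds_per_word=30):
    """Number of words that fit a session at a target study time per word.

    The default of 30 s reflects the preliminary one-week estimate in the
    text; it is an assumption, not a general constant.
    """
    if session_seconds < 0 or seconds_per_word <= 0:
        raise ValueError("need session_seconds >= 0 and seconds_per_word > 0")
    # Floor division: a partially studied word is not counted.
    return int(session_seconds // seconds_per_word)


# Hypothetical 15-minute study session.
session = 15 * 60
print(words_for_session(session))
```

This calculation fixes only the average time per word; it says nothing about how that time should be divided across presentations.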

The Benefits of Distributed Practice 

While the data presented here suggest that the 
benefits of overlearning diminish with time, we are 
certainly not advocating that people should avoid large 
amounts of practice to achieve long-term retention. 
Instead, we suggest that post-criterion practice be 
delayed until a later day in order to reap the benefits of 
the so-called spacing effect. Specifically, by 
distributing (or spacing) practice across multiple 
sessions rather than concentrating (or massing) the 
same amount of practice into one session, long-term 
retention is often boosted dramatically (Austin, 1921; 
Bahrick, Bahrick, Bahrick, & Bahrick, 1993; Bjork, 
1979; Bloom & Schuell, 1981; Cull, 2000; Dempster, 
1989; Earhard & Landry, 1976; Glenberg & Lehman, 
1980; Greene, 1989; Reynolds & Glaser, 1964; 
Schmidt & Bjork, 1992; Shaughnessy, 1977). 

Unfortunately, the strategy of distributed 
practice is underutilized. For example, the majority of 
vocabulary words presented in foreign language 
textbooks appear in one chapter rather than multiple 
chapters. Likewise, for each daily lesson within many 
mathematics textbooks, the majority of the following 
exercises concern that day’s lesson. The spacing effect 
also appears to be underappreciated by cognitive 
psychologists, as it is mentioned in only three of the 
twelve introductory cognitive psychology textbooks 
belonging to the first author. 

In fact, the paucity of spacing in educational 
curricula may partly reflect the popularity of 
overlearning, as the two strategies often conflict in 
applied settings. That is, for a study session of a given 
duration, the overlearning of any particular skill leaves 
less time for the review of previously learned skills. 
Thus, every additional math problem done for the sake 
of overlearning is one less problem devoted to the 
principle of spaced practice. Hopefully, an 
appreciation of the severe limitations of overlearning 
will encourage a greater reliance on the productive 
principle of spaced learning. 

Figure 1. Experiment 1. The Hi Learners (n = 63) who achieved at least three perfect learning trials were 
dubbed the True Hi Learners (n = 58). The Lo Learners (n = 67) who achieved no more than one perfect trial were the 
True Lo Learners (n = 44). (A) The Hi Learners and Lo Learners underwent the same procedure during the first five 
trials. The dip in the Hi Learners' data at Trial 6 reflects a brief rest after Trial 5. (B) For both graphs, both main effects 
and the interaction are statistically significant. Error bars indicate plus or minus one standard error. 

Figure 2. Experiment 2. The Hi Learners (n = 46) who achieved at least three perfect learning trials were 
dubbed the True Hi Learners (n = 40). The Lo Learners (n = 42) who achieved no more than one perfect trial were the 
True Lo Learners (n = 41). (A) The Hi Learners (and True Hi Learners) completed 10 learning trials with a 10-word 
list, whereas the Lo Learners (and True Lo Learners) completed 5 learning trials with a 20-word list. Total study time 
did not differ. (B) For both graphs, both main effects and the interaction are statistically significant. Error bars indicate 
plus or minus one standard error. 




